fix: performance issue in interpretability notebooks #1238

memoryz · 2021-11-03T05:54:28Z

In the notebooks, the background data should be broadcast. When the explain_instances dataframe (the observations to be explained) is mid-sized (roughly 50 to 100 rows), Spark picks an unexpected join plan that disrupts the parallelization of the Kernel SHAP sampler and creates a performance bottleneck. Broadcasting the background dataset makes Spark respect the partitioning of the explain_instances dataframe.

Both notebooks explain only 5 data points, so the bottleneck is not obvious. If we change 5 to 50, it becomes obvious. If we go further and change it to 500, Spark uses the intended join plan and the bottleneck is not triggered.

I thought about forcing the broadcast inside the explainer, but that could have unexpected effects in other scenarios, so I'm hesitant to do so.

memoryz · 2021-11-03T05:54:37Z

/azp run

azure-pipelines · 2021-11-03T05:54:47Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov-commenter · 2021-11-03T06:00:46Z

Codecov Report

Merging #1238 (140641f) into master (81f5f80) will increase coverage by 0.20%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1238      +/-   ##
==========================================
+ Coverage   83.38%   83.59%   +0.20%     
==========================================
  Files         277      264      -13     
  Lines       13094    12919     -175     
  Branches      634      634              
==========================================
- Hits        10918    10799     -119     
+ Misses       2176     2120      -56

Impacted Files	Coverage Δ
...ython/synapse/ml/vw/VowpalWabbitRegressionModel.py
...ain/python/synapse/ml/vw/VowpalWabbitClassifier.py
vw/src/main/python/synapse/ml/vw/__init__.py
...c/main/python/synapse/ml/nn/ConditionalBallTree.py
.../main/python/synapse/ml/recommendation/SARModel.py
...main/python/synapse/ml/vw/VowpalWabbitRegressor.py
...thon/synapse/ml/vw/VowpalWabbitContextualBandit.py
...synapse/ml/vw/VowpalWabbitContextualBanditModel.py
...e/ml/recommendation/RankingTrainValidationSplit.py
...n/synapse/ml/vw/VowpalWabbitClassificationModel.py
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81f5f80...140641f. Read the comment docs.

fix: performance issue in interpretability notebooks

140641f

mhamilton723 approved these changes Nov 3, 2021

imatiach-msft approved these changes Nov 3, 2021

memoryz marked this pull request as ready for review November 3, 2021 07:44

memoryz merged commit 5733b85 into microsoft:master Nov 3, 2021

memoryz deleted the jasowang/notebook branch November 3, 2021 07:45

stuartleeks mentioned this pull request Dec 6, 2021

sl/flatten batch non array stuartleeks/SynapseML#1

Closed

stuartleeks mentioned this pull request Dec 16, 2021

Various improvements for TextAnalyze stuartleeks/SynapseML#2

Closed